-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dynamic extraction of a parameter attribute #3143
base: main
Are you sure you want to change the base?
Add dynamic extraction of a parameter attribute #3143
Conversation
9a9d21f
to
ce33ec7
Compare
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
ce33ec7
to
5ebb3e0
Compare
e3117d7
to
ba048b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, left some comments
thanks for splitting the code in multiple logical changes, it's rare ;-)
we were originally thinking of using C expression parsing (there's some go lib for that) but I think that can be done later if it's ever needed, the basic parsing you did should be fine
please add tests, would be great to have some framework where we could easily add various 'ExtractParam' expressions for testing
ba048b3
to
1fb7dbc
Compare
753fd10
to
2b1153f
Compare
This commit introduces the `struct config_btf_arg_depth`. It appends `btf_argN` to `struct event_config` as an array. This array stores the path to the searched data. Any `btf_argN` can have a list of elements as follow : file->f_path is 152 bytes, so the array will look like [{ offset: 152, is_pointer: 0, is_initialized: 1 }, ...]. The max value `MAX_BTF_ARG_DEPTH` as been set arbitrary as the verifier need a fixed size. In config_btf_arg, is_pointer and is_initialized are u16 because it must match padding of 64 bits long structure Signed-off-by: Tristan d'Audibert <[email protected]>
2b1153f
to
a04e634
Compare
…_argN This commit introduces the “extract_arg_depth” function and loops over it to move into the “arg” buffer of config->btf_argN[i]->offset by iteration. The goal is to reach the requiered data by overwriting over arg with the new value. Signed-off-by: Tristan d'Audibert <[email protected]>
This commit checks if btf.Type exists in Tetragon existing types. For instance: `struct file` with btf is called `file` and also exists in `GenericStringToType` with the same alias. However, the attribute `name` in `struct qstr` has a returned type `unsigned char` wich does not exist yet in `GenericStringToType`. The same thing happened with `linux_binprm->filename` as the type is `char` Signed-off-by: Tristan d'Audibert <[email protected]>
a04e634
to
058c828
Compare
FindNextBtfType function recursively searches in a btf structure in order to find a specific path until it reaches the target or fails. The function also searches in embedded anonymous structures or unions to cover as much use cases as possible. For instance, mm_struct has 2 fields; anonymous struct and another type. But you are still able to look into the anonymous struct by specifying a path like "mm.pgd.pgd". For instance, if the search is in the linux_binprm structure and the path is `file.f_path.dentry.d_name.name`, the following actions will be done. - Look for the variable name `file` inside `linux_binprm`. - If it matches, it stores the offset from linux_binprm where the `file` variable could be found. - Then it takes the btf type `file` and searches for a parameter named `f_path`. Signed-off-by: Tristan d'Audibert <[email protected]>
In order to read the data properly on BPF side, integer/long values must use `bpf_probe_read`. So now, every time the latest type retrieved is defined as an integer, it will be safely read before accessing the data. Signed-off-by: Tristan d'Audibert <[email protected]>
This commit adds 2 parameters to give the ability to the user to search for a specific variable following a "path" as follow: ```yml ... args: - index: 0 type: "linux_binprm" extractParam: "file.f_path.dentry.d_name.name" overwriteType: "string" ... ``` The above config can be used to extract a specific parameter from the structure at index 0. Signed-off-by: Tristan d'Audibert <[email protected]>
The OverwriteType parameter should be deleted if `argSelectorType` can use the `EventConfig` structure to search for the correct Type. Signed-off-by: Tristan d'Audibert <[email protected]>
Searches if every user defined type with ExtractParam exists as a BTF type and stores its corresponding offset in ConfigBtfArg. This function does a basic split on `ExtractParam` to obtain the "path" to the required data. Then, the array is given to `btf.FindNextBTFType` to find the offset of each element until we reach the required data. The output is stored in EventConfig to keep the normal behaviour. For example, if the arg 0 is `struct linux_binprm` and ExtractParam is set to `file.f_path.dentry.d_name.name`, the output will give an array of all the offsets from their parents as follows : [{ offset: 96, is_pointer: 0 }, { offset: 152, is_pointer: 1 }, ...] Signed-off-by: Tristan d'Audibert <[email protected]>
This commit updates `addLsm` function to use the `ExtractParam` and `OverwriteType` in order to look for the attributes in BTF structure. Signed-off-by: Tristan d'Audibert <[email protected]>
…probes As BTF types are not defined for Uprobes, their offsets can't be found in BTF file. Thus, with this commit, if the user defines ExtractParam / OverwriteType, their are ignored and a warning is displayed Signed-off-by: Tristan d'Audibert <[email protected]>
Add very similar code as in `genericlsm.go` file to handle ExtractParam feature. Signed-off-by: Tristan d'Audibert <[email protected]>
This commit adds 3 tests for FindNextBtfType algorithm. The first is `testAssertEqualBtfPath` to assert that a specific path has the exact same btfConfig as expected. The second, "testAssertPathIsAccessible" tries to reach the path and asserts that no errors are raised. The third test "testAssertErrorOnInvalidPath" asserts that the error messages raised if the path is incorrect. The chosen test cases have embed union/anonymous structs. The important thing to notice in case of adding new tests is to be aware of btf changes on different architectures. Especially for `testAssertEqualBtfPath` where Offset could be different. To Test locally : ``` go test -exec "sudo" ./pkg/btf/ ``` Signed-off-by: Tristan d'Audibert <[email protected]>
This test aims at testing the ExtractParam feature. Tetragon use hard-coded types instead of BTF. So currently, the ExtractParam feature does not allow to extract attributes from other types than the few hard-coded in `generictypes.go`. For instance, if you try to use the feature with task_struct structure, it will fail because it is not yet hard-coded. Signed-off-by: Tristan d'Audibert <[email protected]>
Signed-off-by: Tristan d'Audibert <[email protected]>
Signed-off-by: Tristan d'Audibert <[email protected]>
058c828
to
391c5a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, I left some comments, thanks
@@ -172,6 +185,11 @@ struct event_config { | |||
*/ | |||
__u32 policy_id; | |||
__u32 flags; | |||
struct config_btf_arg btf_arg0[MAX_BTF_ARG_DEPTH]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not follow the user space and have just btf_arg[5][10]
I don't see btf_arg[1-4]
being used anywhere
"fmt" | ||
"strings" | ||
|
||
_btf "github.com/cilium/ebpf/btf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit _
looks strange to me, would preffer ebtf
which we use on some other place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah especially _
has a specific use in Go imports.
return processMembers(btfArgs, currentType, t.Members, pathToFound, i) | ||
case *btf.Pointer: | ||
(*btfArgs)[i-1].IsPointer = uint16(1) | ||
return FindNextBtfType(btfArgs, t.Target, pathToFound, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is called (at beggining) with i == 0
, but you are setting [i -1]
field, could this crash?
Return: The last type found matching the path, or error. | ||
*/ | ||
func FindNextBtfType( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is mostly called like:
lastBtfType, err := btf.FindNextBtfType(btfArg, rootType, partialPath, 0)
which seems confusing because it's called next but return last type,
also always called with zero index
how about we add another top level function like ResolveParam that would
prepare the path array and called local function that would recurse?
@@ -341,6 +341,10 @@ func addUprobe(spec *v1alpha1.UProbeSpec, ids []idtable.EntryID, in *addUprobeIn | |||
return nil, fmt.Errorf("Error add arg: ArgType %s Index %d out of bounds", | |||
a.Type, int(a.Index)) | |||
} | |||
|
|||
if a.ExtractParam != "" || a.OverwriteType != "" { | |||
logger.GetLogger().Warnf("Extracting parameters from Uprobes is not supported, ignoring") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's fail the policy, it's wrong IMO
assert.ErrorContains(t, err, "Attribute 'fail' not found in structure ''") | ||
} | ||
|
||
func TestFindNextBtf(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test crashes in CI, please check
{Offset: 64, IsPointer: 1, IsInitialized: 1}, | ||
{Offset: 168, IsPointer: 1, IsInitialized: 1}, | ||
{Offset: 0, IsPointer: 1, IsInitialized: 1}, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this will fail the same way like TestBuildBtfArgFromLSMPolicy
on offset values?
could you just read expected offsets from BTF?
- index: 0 | ||
type: "linux_binprm" | ||
extractParam: "mm.owner.real_parent.real_parent.comm" | ||
overwriteType: "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need overwriteType? could we maybe just do something like:
- index: 0
type: "string"
extractParam: "linux_binprm.file.f_path.dentry.d_name.name"
the overwrite stuff might cause headaches later
also maybe rename extractParam
to resolve
? not sure ;-)
cc @kkourt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but fyi it seems it's motivated by this https://github.com/cilium/tetragon/pull/3143/files#diff-3c0b10317f585c0016d0208630c3abbb0e8d39571e91254af88874710903a260R442-R455.
struct config_btf_arg btf_arg1[MAX_BTF_ARG_DEPTH]; | ||
struct config_btf_arg btf_arg2[MAX_BTF_ARG_DEPTH]; | ||
struct config_btf_arg btf_arg3[MAX_BTF_ARG_DEPTH]; | ||
struct config_btf_arg btf_arg4[MAX_BTF_ARG_DEPTH]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not follow the user space part and have just btf_arg[5][MAX_BTF_ARG_DEPTH]
I did not see btf_arg[1-4]
being touched elsewhere..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been pulled-in for the docs change, here's a review. Thanks for adding docs!
"fmt" | ||
"strings" | ||
|
||
_btf "github.com/cilium/ebpf/btf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah especially _
has a specific use in Go imports.
For specific use cases, you may want to extract a specific attribute from the argument. | ||
For instance you have `struct linux_binprm` as first argument and want to filter parent | ||
process name, you can do it as following. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For specific use cases, you may want to extract a specific attribute from the argument. | |
For instance you have `struct linux_binprm` as first argument and want to filter parent | |
process name, you can do it as following. | |
You may want to extract a specific attribute from the argument structure. For instance, | |
you are hooking a function with `struct linux_binprm` as the first argument and want to | |
filter the parent process' name, you can do it as follows. |
@@ -400,6 +400,61 @@ The `maxData` flag does not work with `returnCopy` flag at the moment, so it's | |||
usable only for syscalls/functions that do not require return probe to read the | |||
data. | |||
|
|||
### Advanced usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe be more specific on this one, like "extracting structure fields/attribute", whatever you prefer that describes best this feature.
The above policy will display the parent process name every time the hook is called. | ||
The `extractParam` field is used to reach a specific data into the `struct | ||
linux_binprm`. It is important to set `overwriteType` as well to make sure the | ||
reached data is read correctly (as a string in this case). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above policy will display the parent process name every time the hook is called. | |
The `extractParam` field is used to reach a specific data into the `struct | |
linux_binprm`. It is important to set `overwriteType` as well to make sure the | |
reached data is read correctly (as a string in this case). | |
The above policy will display the parent process's comm every time the hook is called. | |
The `extractParam` field is used to reach specific data into the `struct linux_binprm`. | |
It is important to set `overwriteType` as well to make sure the reached data is read | |
correctly (as a string in this case). |
I would say comm
maybe instead of name since this value can be changed by the thread itself, it's not a really safe way of identifying a process name.
- index: 0 | ||
type: "linux_binprm" | ||
extractParam: "mm.owner.real_parent.real_parent.comm" | ||
overwriteType: "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but fyi it seems it's motivated by this https://github.com/cilium/tetragon/pull/3143/files#diff-3c0b10317f585c0016d0208630c3abbb0e8d39571e91254af88874710903a260R442-R455.
- This feature requires you to know exactly what you are looking for in the attributes | ||
of the hook parameters. For instance, if you want to have a look on what is | ||
available inside `struct linux_binprm`, take a look at the | ||
[Bootlin website](https://elixir.bootlin.com/linux/v6.12.5/source/include/linux/binfmts.h#L18) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This feature requires you to know exactly what you are looking for in the attributes | |
of the hook parameters. For instance, if you want to have a look on what is | |
available inside `struct linux_binprm`, take a look at the | |
[Bootlin website](https://elixir.bootlin.com/linux/v6.12.5/source/include/linux/binfmts.h#L18) | |
- This feature requires you to read the kernel structure definitions to find what you're | |
looking for in the hook parameter attributes. For instance, if you want to have a look | |
at what is available inside `struct linux_binprm`, take a look at its definition in | |
[include/linux/binfmts.h](https://elixir.bootlin.com/linux/v6.12.5/source/include/linux/binfmts.h#L18) |
- Some structures are dynamic. This means that they may change at runtime. So you need to | ||
be aware of what you are looking for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Some structures are dynamic. This means that they may change at runtime. So you need to | |
be aware of what you are looking for. | |
- Some structures are dynamic which means that they may change at runtime. |
Tetragon can also handle some structures such as `struct file` or `struct | ||
path` and few others. This means you can also extract the whole struct, if it is | ||
available in the attributes of the parameter, and set the type with the correct type | ||
like this : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tetragon can also handle some structures such as `struct file` or `struct | |
path` and few others. This means you can also extract the whole struct, if it is | |
available in the attributes of the parameter, and set the type with the correct type | |
like this : | |
Tetragon can also handle some structures such as `struct file` or `struct | |
path` and a few others. This means you can also extract the whole struct if it is | |
available in the attributes of the parameter, and set the type with the correct type | |
like this : |
```yaml | ||
- index: 0 | ||
type: "linux_binprm" | ||
extractParam: "file" | ||
overwriteType: "file" | ||
# Or | ||
# extractParam: "file.f_path" | ||
# overwriteType: "path" | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```yaml | |
- index: 0 | |
type: "linux_binprm" | |
extractParam: "file" | |
overwriteType: "file" | |
# Or | |
# extractParam: "file.f_path" | |
# overwriteType: "path" | |
``` | |
```yaml | |
- index: 0 | |
type: "linux_binprm" | |
extractParam: "file" | |
overwriteType: "file" | |
``` | |
Or | |
```yaml | |
- index: 0 | |
extractParam: "file.f_path" | |
overwriteType: "path" | |
``` |
Mmh not sure my suggestion is correct but I would split it in two.
The discussion for this PR can be found here #3142
This is currently a draft.
I wanted to start the discussion before submitting the final code, as I think, it is a big enough PR.
Take this PR in the today state as a proof of concept for dynamic parameter extraction. I will continue to work on this PR until the below checks are done.
At the current state, the PR is able to
Description
This PR introduce the dynamic extraction of a custom attribute
Comments
OverwriteType
parameter. But since the functionargSelectorType
does not receiveEventConfig
, it is not possible to search for the type using directly BTF types. So I suggest doing another PR before this one is merged to do so if possible. Then I'll remove the parameter. It is also not possible to add an if condition foruprobes
in this function. So if the user defines the parameter, it will overwrite the type at this moment.u8
to store the offset, as the verifier does not allow me to useu16
. At this moment, I don't know if it is possible to found offset > 255 in BTF structures. For my uses cases, usingu8
was enough.Test the PR
You can use the following config
If you want to test it with more arguments, you can use
bprm_creds_from_file
hook. It hasstruct linux_binprm
andstruct file
which are supported.Release-note